Abstract
Submit the paper as a short communication / technical note?
Write a different introduction here … Approach: besides the validation of measurement devices, the validation of model results also plays a role!
Therefore, when analyzing the validity of curve data, the entire length of the curve should be accounted for. This can be accomplished, e.g., by applying statistics separately to all points of the curve (Pini et al., 2019). The pointwise approach allows for an error representation over the entire domain, but ignores the fact that the sampling points within a curve are locally correlated (Deluzio & Astephen, 2007; Pataky, 2010). From a statistical point of view, this implies that the random error component is likely underrepresented when the (nonzero) covariance term of adjacent sampling points is not accounted for (Lenhoff et al., 1999).
In addition, most traditional point statistics are either not suitable (e.g. statistical tests of mean differences) or only suitable to a limited extent (e.g. the product-moment correlation) for quantifying the agreement between two measurement systems (Bland & Altman, 1986).
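The limitation of the product-moment correlation for agreement analysis can be illustrated with a minimal R sketch (all values hypothetical): two devices that differ by a constant offset correlate perfectly, yet clearly disagree.

```r
# Hypothetical example: perfect correlation despite poor agreement.
set.seed(1)
device.a <- rnorm(50, mean = 10, sd = 2)
device.b <- device.a + 5            # constant systematic error of 5 units

r    <- cor(device.a, device.b)     # product-moment correlation: ~1
bias <- mean(device.b - device.a)   # Bland-Altman bias: ~5 units
```

The correlation is insensitive to the systematic offset, which only the Bland & Altman bias reveals.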
One possibility to address the problem of pointwise analysis is to treat entire curves as (functional) objects rather than as a series of unconnected points.
Sutherland et al. (1988) introduced a method in which continuous signals are reconstructed using Fourier series and bootstrapped to estimate the variation of the sample mean and thus construct confidence intervals. Lenhoff et al. (1999) illustrated this method on a generic gait analysis data set (comprising joint angle curves) and compared it to pointwise Gaussian intervals. They found that the Sutherland et al. bootstrap bands provide appropriate coverage for continuous gait curve data (86% coverage at a targeted coverage of 90%), whereas pointwise Gaussian bands provide inadequate coverage (54% at a targeted coverage of 90%).
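The core idea of bootstrapping a sample of curves to obtain a band can be sketched in a few lines of R. This is a minimal illustration (not the original Sutherland/Lenhoff implementation, and without the Fourier reconstruction step); all data here are synthetic.

```r
# Sketch: bootstrap the sample mean of a set of curves and take
# pointwise quantiles of the bootstrapped means as a 90% band.
set.seed(1)
t <- seq(0, 1, length.out = 101)
curves <- t(replicate(20, sin(2 * pi * t) + rnorm(1, sd = 0.2)))  # 20 x 101

n.boot <- 1000
boot.means <- replicate(n.boot, {
  idx <- sample(nrow(curves), replace = TRUE)   # resample whole curves
  colMeans(curves[idx, ])
})                                               # 101 x n.boot

band <- apply(boot.means, 1, quantile, probs = c(0.05, 0.95))  # 2 x 101
```

Resampling entire curves (rather than individual points) preserves the within-curve autocorrelation in the bootstrap distribution.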
Røislien et al. (2012) presented a similar approach to calculate a functional version of the Limits of Agreement (LoA) method suggested by Bland & Altman. They first calculate functional difference curves, which are then bootstrapped to obtain 95% confidence intervals. The approach accounts for the autocorrelation characteristic of the signal and at the same time allows for an intuitive description of the systematic and random error terms in the original sampling space (e.g. actual degrees). Røislien et al., however, rely on a number of strong model assumptions, which are often not, or only partially, fulfilled in empirical data sets:
The construction of symmetric prediction bands as described in Røislien et al. (2012) relies on Gaussian assumptions. Non-normality, however, is often present, e.g. when sensor values are truncated because of limited range or resolution, or when values drift over time.
A further assumption is that the error characteristics are homogeneous across the sample, i.e. independent of covariates such as movement speed or technique. This assumption is violated considerably in many validation data sets [12]. For example, Atallah et al. [13] found that the agreement between an ear-worn accelerometer and a force-plate instrumented treadmill was associated with gait speed when estimating gait cycle duration. When validating a foot-worn gyroscope for treadmill running, Koska & Maiwald [14] observed a linear error trend that was associated with foot strike behavior.
Moreover, the method uses arithmetic mean curves, the variability of which reflects variation between means. In many cases, however, it is the variability of individual values that is relevant, and limits based on mean curves will be too narrow by a factor of approximately the square root of the number of curves within the sample.
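The narrowing factor can be checked with a quick numeric sketch in R (hypothetical values; individual values here have a standard deviation of 1):

```r
# Sketch: the SD of sample means underestimates the SD of individual
# values by a factor of ~sqrt(n).
set.seed(1)
n <- 10                                    # curves per sample
means <- replicate(5000, mean(rnorm(n)))   # SD of individual values is 1
sd(means)                                  # ~ 1 / sqrt(10) ~ 0.316
```

Limits built from the variability of mean curves would therefore be roughly sqrt(10) ≈ 3.2 times too narrow for predicting individual curves in this setting.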
A further problem is that the method does not account for repeated-measures data. Since repeatedly measured curves are typically more similar to each other than independent curves, there is a risk of overestimating the dispersion of the measurement system.
The principal objective of this paper is to demonstrate a method for constructing continuous prediction intervals from curve data that addresses the aforementioned issues (analyzing the whole curve, accommodating repeated measures, and relaxing model assumptions such as normality and equal variances), and thus enables a more appropriate characterization of the agreement between measurement curves on the basis of easily interpretable error bands (i.e. LoA in the actual measurement unit).
We use synthetic time series data sets containing typical curves of biomechanical measurement systems to illustrate the method, and compare the resulting prediction bands to pointwise Bland & Altman parameters and to LoA constructed similarly to the method described in Røislien et al. (2012). For model validation, the respective LoA are cross-validated using a leave-one-out approach.
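The pointwise Bland & Altman reference can be sketched in R as follows (a minimal illustration with synthetic difference curves; at every sampling point, the LoA are bias ± 1.96 × SD of the differences):

```r
# Sketch: pointwise Bland & Altman LoA over a set of difference curves.
set.seed(1)
diff.curves <- matrix(rnorm(30 * 101, mean = 0.5, sd = 0.2),
                      nrow = 30, ncol = 101)   # 30 difference curves

bias  <- colMeans(diff.curves)                 # systematic error per point
sdiff <- apply(diff.curves, 2, sd)             # random error per point
floa.point <- rbind(lower = bias - 1.96 * sdiff,
                    upper = bias + 1.96 * sdiff)
```

Because each sampling point is treated in isolation, these limits ignore the autocorrelation along the curve, which is exactly the issue the continuous methods address.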
FLoA derived by different methods are compared
To compare different methods for constructing continuous LoA, several sets of smooth curve data were simulated across a range of common biomechanical signals and error characteristics. The curves within each data set are deliberately similar in shape, as the paper is primarily aimed at a validation context in which the same movement process is analyzed by two similar methods (two measurement devices or models). For the same reason, the curves need not correspond to real-world signals: we were primarily concerned with illustrating the respective methods and error characteristics.
All curves were modeled as Fourier series from additively superimposed sine functions. Each data set contains n.strides = 10 curves from each of n.subj = 11 fictitious test subjects. Each curve consists of t = 101 data points.
n.subj <- 11
n.strides <- 10
t <- seq(0, 100)   # 101 sampling points
for (subject.idx in 1:n.subj) {
# Subjectwise parameters
offset.mean <- runif(1, min = -0.5, max = 0.5)
a.sd <- runif(1, min = 0.05, max = 0.15)
b.sd <- runif(1, min = 0.0001, max = 0.002)
for (stride.idx in 1:n.strides) {
a1.device1 <- rnorm(1, mean = 3, sd = a.sd)
a1.device2 <- rnorm(1, mean = 3, sd = a.sd)
a2.device1 <- 0.08
a2.device2 <- 0.08
b1.device1 <- rnorm(1, mean = 0.06, sd = b.sd)
b1.device2 <- rnorm(1, mean = 0.06, sd = b.sd)
b2.device1 <- rnorm(1, mean = 0.58, sd = b.sd)
b2.device2 <- rnorm(1, mean = 0.58, sd = b.sd)
c <- 2
sine.1 <- a1.device1 * sin(b1.device1 * t) ^ (c + 3)
sine.2 <- a2.device1 * sin(b2.device1 * t)
sine.3 <- a1.device2 * sin(b1.device2 * t) ^ (c + 3)
sine.4 <- a2.device2 * sin(b2.device2 * t)
offset <- rnorm(1, offset.mean, 0.05)
# Model (i), (ii), or (iii)
# ...
# ...
# Model (i)
device.1 <- sine.1 + sine.2
device.2 <- offset + sine.3 + sine.4
# ...
trend <- (1 / 100000) * seq(0.5, 50.5, 0.5)^3
# Model (ii)
device.1 <- sine.1 + sine.2
device.2 <- offset + sine.3 + sine.4 + trend
# ...
a1.device2 <- rnorm(n = 1,
                    # factorial(1/1.5) = gamma(1 + 1/1.5) is the Weibull mean,
                    # subtracted to center the distribution around 0
                    mean = rweibull(1, shape = 1.5, scale = 1) - factorial(1/1.5),
                    sd = a.sd)
# ...
sine.3 <- a1.device2 * sin(b1.device2 * t) ^ (c + 3)
# ...
# Model (iii)
device.1 <- sine.1 + sine.2
device.2 <- offset + sine.3 + sine.4
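The validation code further below expects the difference curves in a long-format data frame with columns strideID and value. How this frame might be assembled can be sketched as follows (a self-contained, hypothetical illustration; only the column names strideID and value are taken from the code below, the signal itself is illustrative):

```r
# Sketch: collect difference curves in long format (strideID, value).
t <- seq(0, 100)                 # 101 sampling points
device.diff <- data.frame()
for (stride.idx in 1:3) {        # 3 illustrative strides
  device.1 <- sin(0.06 * t) + rnorm(101, sd = 0.05)
  device.2 <- sin(0.06 * t)
  device.diff <- rbind(device.diff,
                       data.frame(strideID = stride.idx,
                                  t        = t,
                                  value    = device.1 - device.2))
}
```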
Leave-one-out method to estimate the achieved coverage
Prediction intervals are calculated from size.dataset = number.subjects - 1 subjects and then validated against the left-out subject's curves, for five different methods:
FLoARCB_v1: n = length(subjects) random strides drawn from all strides
FLoARCB_v2: one random stride per subject
FLoARCB_v3: a single random stride drawn from all strides
FLoARoislien: Røislien approach (draw one random stride from each subject once and bootstrap the resulting sample of length n = length(subjects))
FLoAPoint: pointwise Bland & Altman Limits of Agreement
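The leave-one-out loop itself can be sketched as follows. This is a self-contained, hypothetical illustration that uses simple pointwise LoA as a stand-in for the five estimators above; data layout and numbers are illustrative only.

```r
# Sketch: leave-one-out validation of continuous limits.
set.seed(1)
n.subj <- 5
curves <- lapply(1:n.subj, function(s) {
  matrix(rnorm(10 * 101, sd = 0.2), nrow = 10)  # 10 difference curves each
})

loo.coverage <- sapply(1:n.subj, function(s) {
  train <- do.call(rbind, curves[-s])           # all other subjects
  lwr <- colMeans(train) - 1.96 * apply(train, 2, sd)
  upr <- colMeans(train) + 1.96 * apply(train, 2, sd)
  test <- curves[[s]]                           # left-out subject
  # fraction of the left-out subject's curves lying entirely inside
  mean(apply(test, 1, function(crv) all(crv > lwr & crv < upr)))
})
```

Each subject in turn serves as the validation set, yielding one coverage value per subject.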
Boxplots are displayed for the five different methods; coverages are shown for two different scenarios:
a: Percentage of entirely covered curves (all points within the limits): each box consists of n = 11 values (one percentage per subject).
b: Mean percentage of curve points within the respective FLoA: The fraction of points along a curve within the limits is calculated and averaged across all curves of a subject.
lwr.bnd <- floa["lower", ]
upr.bnd <- floa["upper", ]
# ...
outside <- 0
for (stride.idx in stride.indices) {
  curve <- subset(device.diff, strideID == stride.idx)
  # Compare the difference curve with the upper and lower boundaries
  below.thresh <- curve$value < lwr.bnd
  above.thresh <- curve$value > upr.bnd
  points.outside <- sum(above.thresh) + sum(below.thresh)
  # Count the curve as uncovered if any point lies outside the limits
  if (points.outside > 0) {
    outside <- outside + 1
  }
}
coverage <- 1 - (outside / n.strides)
lwr.bnd <- floa["lower", ]
upr.bnd <- floa["upper", ]
# ...
inside.thresh.perc <- numeric(0)
for (stride.idx in stride.indices) {
  curve <- subset(device.diff, strideID == stride.idx)
  # Compare the difference curve with the upper and lower boundaries
  below.thresh <- curve$value < lwr.bnd
  above.thresh <- curve$value > upr.bnd
  # Fraction of points inside the limits (neither below nor above)
  inside <- !(below.thresh | above.thresh)
  inside.thresh.perc <- c(inside.thresh.perc, mean(inside))
}
It might be beneficial to work with functional data, as suggested by Lenhoff et al. (1999) or Røislien et al. (2012), e.g. when interpolating data to a standardized length. This approach, however, was not necessary in this paper, since the synthetic data were created from functions and were designed to all have equal length (101 data points).
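For empirical data of varying length, a simple alternative to a full functional representation is linear interpolation onto a standardized grid, e.g. 101 points covering 0–100% of the movement cycle (a minimal sketch with a hypothetical 87-sample curve):

```r
# Sketch: interpolate a curve of arbitrary length onto 101 points.
raw.curve <- sin(seq(0, 2 * pi, length.out = 87))   # e.g. 87 samples
std.curve <- approx(x    = seq(0, 1, length.out = length(raw.curve)),
                    y    = raw.curve,
                    xout = seq(0, 1, length.out = 101))$y
length(std.curve)   # 101
```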
We recommend plotting the data. Ultimately, every data set and every research question is different, and a single blanket coverage value may not do justice to the quality of the measurement device with respect to the question at hand.